Smart Paper Notes

Deep Learning for Multi-User MIMO Systems: Joint Design of Pilot, Limited Feedback, and Precoding

Jeonghyeon Jang, Hoon Lee, Il-Min Kim, Inkyu Lee

Category: Machine Learning

2022-09-21

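In conventional multi-user multiple-input multiple-output (MU-MIMO) systems with frequency division duplexing (FDD), the channel acquisition and precoder optimization processes have been designed separately, even though they are highly coupled. This paper studies an end-to-end design of downlink MU-MIMO systems that covers pilot sequences, limited feedback, and precoding. To address this problem, we propose a novel deep learning (DL) framework that jointly optimizes the feedback information generation at the users and the precoder design at the base station (BS). Each procedure in the MU-MIMO system is replaced by multiple intelligently designed deep neural network (DNN) units. At the BS, a neural network generates pilot sequences and helps the users obtain accurate channel state information. At each user, the channel feedback operation is carried out in a distributed manner by an individual user DNN. Another BS DNN then collects the feedback information from the users and determines the MIMO precoding matrices. A joint training algorithm is proposed to optimize all DNN units in an end-to-end manner. In addition, a training strategy is proposed that avoids retraining for different network sizes, enabling a scalable design. Numerical results demonstrate the effectiveness of the proposed DL framework compared with classical optimization techniques and other conventional DNN schemes.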

Translated by Google Translate

Investigation of Network Architecture for Multimodal Head-and-Neck Tumor Segmentation

Ye Li, Junyu Chen, Se-in Jang, Kuang Gong, Quanzheng Li

Category: Computer Vision

2022-12-21

Inspired by the recent success of Transformers in Natural Language Processing and of Vision Transformers in Computer Vision, many researchers in the medical imaging community have flocked to Transformer-based networks for various mainstream medical tasks such as classification, segmentation, and estimation. In this study, we analyze two recently published Transformer-based network architectures for the task of multimodal head-and-neck tumor segmentation and compare their performance to the de facto standard 3D segmentation network, the nnU-Net. Our results showed that modeling long-range dependencies may be helpful in cases where large structures are present and/or a large field of view is needed. However, for small structures such as head-and-neck tumors, the convolution-based U-Net architecture seemed to perform well, especially when the training dataset is small and computational resources are limited.

DAG: Depth-Aware Guidance with Denoising Diffusion Probabilistic Models

Gyeongnyeon Kim, Wooseok Jang, Gyuseong Lee, Susung Hong, Junyoung Seo, Seungryong Kim

Category: Computer Vision

2022-12-17

In recent years, generative models have undergone significant advancement due to the success of diffusion models. The success of these models is often attributed to their use of guidance techniques, such as classifier and classifier-free methods, which provide effective mechanisms to trade off between fidelity and diversity. However, these methods are not capable of guiding a generated image to be aware of its geometric configuration, e.g., depth, which hinders the application of diffusion models to areas that require a certain level of depth awareness. To address this limitation, we propose a novel guidance approach for diffusion models that uses estimated depth information derived from the rich intermediate representations of diffusion models. To do this, we first present a label-efficient depth estimation framework using the internal representations of diffusion models. At the sampling phase, we utilize two guidance techniques to self-condition the generated image using the estimated depth map: the first uses pseudo-labeling, and the second uses a depth-domain diffusion prior. Experiments and extensive ablation studies demonstrate the effectiveness of our method in guiding the diffusion models toward geometrically plausible image generation. The project page is available at https://ku-cvlab.github.io/DAG/.

Accurate Open-set Recognition for Memory Workload

Jun-Gi Jang, Sooyeon Shim, Vladimir Egay, Jeeyong Lee, Jongmin Park, Suhyun Chae, U Kang

Category: Artificial Intelligence

2022-12-17

How can we accurately identify new memory workloads while classifying known memory workloads? Verifying DRAM (Dynamic Random Access Memory) using various workloads is an important task to guarantee the quality of DRAM. A crucial component in the process is open-set recognition, which aims to detect new workloads not seen in the training phase. Despite its importance, however, existing open-set recognition methods are unsatisfactory in terms of accuracy since they fail to exploit the characteristics of workload sequences. In this paper, we propose Acorn, an accurate open-set recognition method that captures the characteristics of workload sequences. Acorn extracts two types of feature vectors to capture sequential patterns and spatial locality patterns in memory access. Acorn then uses the feature vectors to accurately classify a subsequence into one of the known classes or identify it as the unknown class. Experiments show that Acorn achieves state-of-the-art accuracy, with up to 37 percentage points higher unknown-class detection accuracy than existing methods while achieving comparable known-class classification accuracy.
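
The open-set decision described in the abstract (assign a known class, or reject as unknown) can be illustrated with a generic confidence-threshold baseline. This is only a sketch of the problem setting, not Acorn's method: the function names, the use of softmax confidence, and the threshold value are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def open_set_predict(logits, class_names, threshold=0.5):
    """Return the best known class, or "unknown" if confidence is too low.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best] if probs[best] >= threshold else "unknown"


# A confident score vector maps to a known class; a flat one is rejected.
workloads = ["workload_a", "workload_b", "workload_c"]  # hypothetical labels
print(open_set_predict([5.0, 0.0, 0.0], workloads))
print(open_set_predict([1.0, 1.0, 1.0], workloads))
```

A baseline like this looks only at classifier confidence; the abstract's point is that exploiting sequential and spatial-locality features of the workload improves on such generic schemes.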

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Category: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Can We Find Strong Lottery Tickets in Generative Models?

Sangyeop Yeo, Yoojin Jang, Jy-yong Sohn, Dongyoon Han, Jaejun Yoo

Category: Computer Vision | Machine Learning

2022-12-16

Yes. In this paper, we investigate strong lottery tickets in generative models, i.e., subnetworks that achieve good generative performance without any weight update. Neural network pruning is considered the main cornerstone of model compression for reducing the costs of computation and memory. Unfortunately, pruning a generative model has not been extensively explored, and all existing pruning algorithms suffer from excessive weight-training costs, performance degradation, limited generalizability, or complicated training. To address these problems, we propose to find a strong lottery ticket via moment-matching scores. Our experimental results show that the discovered subnetwork can perform similarly to or better than the trained dense model even when only 10% of the weights remain. To the best of our knowledge, we are the first to show the existence of strong lottery tickets in generative models and to provide an algorithm that finds them stably. Our code and supplementary materials are publicly available.

Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders

Jongseong Jang, Daeun Kyung, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae, Edward Choi

Category: Machine Learning | Computer Vision

2022-12-14

Deep neural networks have been successfully applied to diverse domains, including pathology classification based on medical images. However, large-scale, high-quality data to train powerful neural networks are rare in the medical domain, as the labeling must be done by qualified experts. Researchers recently tackled this problem with some success by taking advantage of models pre-trained on large-scale general-domain data. Specifically, researchers took contrastive image-text encoders (e.g., CLIP) and fine-tuned them with chest X-ray images and paired reports to perform zero-shot pathology classification, thus completely removing the need for pathology-annotated images to train a classification model. Existing studies, however, fine-tuned the pre-trained model with the same contrastive learning objective and failed to exploit the multi-labeled nature of medical image-report pairs. In this paper, we propose a new fine-tuning strategy based on sentence sampling and positive-pair loss relaxation for improving downstream zero-shot pathology classification performance, which can be applied to any pre-trained contrastive image-text encoder. Our method consistently showed dramatically improved zero-shot pathology classification performance on four different chest X-ray datasets and three different pre-trained models (5.77% average AUROC increase). In particular, fine-tuning CLIP with our method achieved performance comparable to, or marginally better than, that of board-certified radiologists (0.619 vs 0.625 in F1 score and 0.530 vs 0.544 in MCC) in zero-shot classification of five prominent diseases from the CheXpert dataset.
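
As a rough illustration of how zero-shot classification with a contrastive image-text encoder can work, the sketch below scores one pathology by comparing an image embedding against the embeddings of a positive and a negative text prompt. It is not the paper's fine-tuning strategy: the toy vectors stand in for real encoder outputs, and the prompt pairing, function names, and temperature value are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_probability(image_emb, pos_text_emb, neg_text_emb, temperature=0.07):
    """Softmax over the similarities to a positive and a negative prompt.

    `temperature` is a hypothetical scaling constant, not taken from the paper.
    """
    pos = cosine(image_emb, pos_text_emb) / temperature
    neg = cosine(image_emb, neg_text_emb) / temperature
    m = max(pos, neg)  # subtract the max for numerical stability
    e_pos, e_neg = math.exp(pos - m), math.exp(neg - m)
    return e_pos / (e_pos + e_neg)


# Toy 2-D embeddings standing in for encoder outputs (assumed values).
image = [0.9, 0.1]
prompt_positive = [1.0, 0.0]   # e.g. a prompt naming the pathology
prompt_negative = [0.0, 1.0]   # e.g. a prompt negating the pathology
print(zero_shot_probability(image, prompt_positive, prompt_negative))
```

Because each pathology is scored independently against its own prompt pair, one image can be flagged for several findings at once, which matches the multi-label nature of chest X-ray reports noted in the abstract.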

NMS Strikes Back

Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl

Category: Computer Vision

2022-12-12

Detection Transformer (DETR) directly transforms queries to unique objects by using one-to-one bipartite matching during training and enables end-to-end object detection. Recently, these models have surpassed traditional detectors on COCO with undeniable elegance. However, they differ from traditional detectors in multiple designs, including model architecture and training schedules, and thus the effectiveness of one-to-one matching is not fully understood. In this work, we conduct a strict comparison between the one-to-one Hungarian matching in DETRs and the one-to-many label assignments in traditional detectors with non-maximum suppression (NMS). Surprisingly, we observe that one-to-many assignments with NMS consistently outperform standard one-to-one matching under the same setting, with a significant gain of up to 2.5 mAP. Our detector, which trains Deformable-DETR with traditional IoU-based label assignment, achieved 50.2 COCO mAP within 12 epochs (1x schedule) with a ResNet50 backbone, outperforming all existing traditional or transformer-based detectors in this setting. On multiple datasets, schedules, and architectures, we consistently show that bipartite matching is unnecessary for performant detection transformers. Furthermore, we attribute the success of detection transformers to their expressive transformer architecture. Code is available at https://github.com/jozhang97/DETA.
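
For readers unfamiliar with the post-processing step named in the title, here is a minimal sketch of greedy non-maximum suppression in plain Python. It shows the standard textbook procedure, not the code from the DETA repository; the box format `(x1, y1, x2, y2)` and the default threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first.

    A box is kept only if it does not overlap any already-kept,
    higher-scoring box by more than `iou_threshold`.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the second box is suppressed by the first
```

One-to-many label assignment lets several predictions fire on the same object during training, so a duplicate-removal step such as this is needed at inference time; one-to-one matching is designed to avoid that step.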

Tag Embedding and Well-defined Intermediate Representation improve Auto-Formulation of Problem Description

Sanghwan Jang

Category: Natural Language Processing

2022-12-07

In this report, I address auto-formulation of problem description, the task of converting an optimization problem into a canonical representation. I first simplify the auto-formulation task by defining an intermediate representation, then introduce entity tag embedding to utilize the given entity tag information. The ablation study demonstrates the effectiveness of the proposed method, which took second place in subtask 2 of the NeurIPS 2022 NL4Opt competition.

D-TensoRF: Tensorial Radiance Fields for Dynamic Scenes

Hankyu Jang, Daeyoung Kim

Category: Computer Vision

2022-12-05

Neural radiance field (NeRF) has attracted attention as a promising approach to reconstructing 3D scenes. Since NeRF emerged, subsequent studies have been conducted to model dynamic scenes, which include motions or topological changes. However, most of them use an additional deformation network, which slows down training and rendering. Tensorial radiance field (TensoRF) recently showed its potential for fast, high-quality reconstruction of static scenes with a compact model size. In this paper, we present D-TensoRF, a tensorial radiance field for dynamic scenes that enables novel view synthesis at a specific time. We consider the radiance field of a dynamic scene as a 5D tensor. The 5D tensor represents a 4D grid whose axes correspond to X, Y, Z, and time, with a 1D multi-channel feature vector per element. Similar to TensoRF, we decompose the grid either into rank-one vector components (CP decomposition) or into low-rank matrix components (the newly proposed MM decomposition). We also use smoothing regularization to reflect the relationship between features at different times (temporal dependency). We conduct extensive evaluations to analyze our models. We show that D-TensoRF with CP decomposition and with MM decomposition both have short training times and significantly low memory footprints, with quantitatively and qualitatively competitive rendering results in comparison to state-of-the-art methods in 3D dynamic scene modeling.
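
To make the CP decomposition of the 5D tensor concrete, the sketch below evaluates the feature vector at one grid index (x, y, z, t) as a sum of R rank-one terms and compares parameter counts against a dense grid. It is an illustrative toy under stated assumptions, not the D-TensoRF implementation: the function names, data layout, and example sizes are hypothetical, and interpolation between grid points is not modeled.

```python
def cp_feature(axis_factors, feature_basis, index):
    """Feature vector at grid index (x, y, z, t) of a rank-R CP tensor.

    axis_factors: four lists (for X, Y, Z, time); each holds R vectors,
                  one per rank component, with length equal to that axis size.
    feature_basis: R vectors of length C (the channel mode).
    """
    channels = len(feature_basis[0])
    out = [0.0] * channels
    for r in range(len(feature_basis)):
        # Rank-one term: product of one scalar per axis...
        weight = 1.0
        for vectors, i in zip(axis_factors, index):
            weight *= vectors[r][i]
        # ...times the r-th channel vector.
        for c in range(channels):
            out[c] += weight * feature_basis[r][c]
    return out


def dense_param_count(dims, channels):
    """Entries in the full 5D tensor: X * Y * Z * T * C."""
    total = channels
    for d in dims:
        total *= d
    return total


def cp_param_count(dims, channels, rank):
    """Entries stored by the CP factors: R * (X + Y + Z + T + C)."""
    return rank * (sum(dims) + channels)


# Hypothetical sizes, chosen only to show the scale of the saving.
dims, channels, rank = (128, 128, 128, 32), 27, 96
print(dense_param_count(dims, channels), cp_param_count(dims, channels, rank))
```

The dense count grows with the product of the axis sizes while the CP count grows with their sum, which is the source of the low memory footprint the abstract reports.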
